The first two chunks of this r markdown file after the r setup allow for plot zooming, but it also means that the html file must be opened in a browser to view the document properly. When it knits in RStudio the preview will appear empty but the html when opened in a browser will have all the info and you can click on each plot to Zoom in on it.

Before you begin

Notes

A few notes about this script.

If you are running this with the 2022-2023 data make sure you download the whole (OSM_2022-2023 GitHub repository)[https://github.com/ACMElabUvic/OSM_2022-2023] from the ACMElabUvic GitHub. This will ensure you have all the files, data, and proper folder structure you will need to run this code and associated analyses.

Also make sure you open RStudio through the R project (OSM_2022-2023.Rproj) this will automatically set your working directory to the correct place (wherever you saved the repository) and ensure you don’t have to change the file paths for some of the data.

Lastly, if you are looking to adapt this code for a future year of data, you will want to ensure you have run the 2_ACME_landscape_covariate_exploration_script.Rmd with your data as there is much data formatting, cleaning, and restructuring that has to be done before this code will work. Helpful note: The files are numbered in the order they are used for this analysis.

If you have question please email the most recent author, currently

Marissa A. Dyck
Postdoctoral research fellow
University of Victoria
School of Environmental Studies
Email: marissadyck17@gmail.com

(update/add authors as needed)

Install packages

If you don’t already have the following packages installed, use the code below to install them.

install.packages('tidyverse') 
install.packages('ggpubr')
install.packages('corrplot')
install.packages('Hmisc')
install.packages('glmmTMB')
install.packages('MuMIn')
install.packages('TMB', type = 'source')
install.packages('rphylopic')
install.packages('broom')

Load libraries

Then load the packages to your library.

library(tidyverse) # data tidying, visualization, and much more; this will load all tidyverse packages, can see complete list using tidyverse_packages()
library(ggpubr) # make modificaions to plot for publication (arrange plots)
library(PerformanceAnalytics)    #Used to generate a correlation plot
library(Hmisc) # used to generate histograms for all variables in data frame
library(glmmTMB)      #Constructing GLMMs
## Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
## TMB was built with Matrix version 1.4.1
## Current Matrix version is 1.5.3
## Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
library(MuMIn) # for model selection
library(rphylopic) # add animal silhouettes to graphs
library(broom) # extracting odds ratios in a tidy format

Summary statistics

Load detection data

Read in saved and cleaned detection data for the 6 LUs from 2021-2022 and 2022-2023 e.g., the 1_ACME_camera_script_9-2-2024.R.

These are two separate files from the two different fiscal years, so they need to be imported and then merged into one data file for plotting. Since both are stored in the data/processed/ folder we can read them both in as a list with purrr.

# detection data
# read in saved and cleaned detection data from the ACME_camera_script_9-2-2024.R 
detections <- file.path('data/processed',
                        
                        c('OSM_ind_det_2021.csv',
                          'OSM_ind_det_2022.csv')) %>% 
  
 map(~.x %>%
        read_csv(.) %>% 
       
       # ensure the array, site, species, and event_id read in as factors
       mutate_if(is.character,
                 as.factor)) %>% 
  
  # give names to each data frame in list
  purrr::set_names('dets_2021',
                   'dets_2022') # R doesn't like when they are just numbers, you can make it work but it's annoying to call the data frame later so I've called them dets_year
## Rows: 6696 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): array, site, species, event_id
## dbl  (3): month, year, timediff
## dttm (1): datetime
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 14055 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): array, site, species, event_id
## dbl  (3): month, year, timediff
## dttm (1): datetime
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(detections)
## List of 2
##  $ dets_2021: tibble [6,696 × 8] (S3: tbl_df/tbl/data.frame)
##   ..$ array   : Factor w/ 2 levels "LU2","LU3": 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ site    : Factor w/ 78 levels "LU2_03","LU2_05",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ species : Factor w/ 35 levels "Arctic hare",..: 28 28 35 35 35 35 35 35 35 35 ...
##   ..$ datetime: POSIXct[1:6696], format: "2021-07-15 11:40:07" "2022-02-06 15:11:06" ...
##   ..$ month   : num [1:6696] 7 2 7 8 8 8 8 8 8 9 ...
##   ..$ year    : num [1:6696] 2021 2022 2021 2021 2021 ...
##   ..$ timediff: num [1:6696] NA 296839 NA 28519 13230 ...
##   ..$ event_id: Factor w/ 6696 levels "E1000000","E1000001",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ dets_2022: tibble [14,055 × 8] (S3: tbl_df/tbl/data.frame)
##   ..$ array   : Factor w/ 4 levels "LU01","LU13",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ site    : Factor w/ 155 levels "LU01_06","LU01_10",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ species : Factor w/ 39 levels "ATVer","Beaver",..: 31 31 38 38 38 38 38 38 38 38 ...
##   ..$ datetime: POSIXct[1:14055], format: "2022-06-17 10:01:52" "2023-09-10 12:51:15" ...
##   ..$ month   : num [1:14055] 6 9 6 7 7 7 8 8 8 8 ...
##   ..$ year    : num [1:14055] 2022 2023 2022 2022 2022 ...
##   ..$ timediff: num [1:14055] NA 648166 NA 31847 21429 ...
##   ..$ event_id: Factor w/ 14055 levels "E0","E1","E10",..: 1 2 5168 6279 7390 8501 9612 10723 11834 12945 ...

Merge data

Now we need to merge these two files to make plotting easier.

They have the same columns so we could just the base R rbind() function, but in case there are differences in the columns in the future, let’s use the cleaner dplyr::bind_rows() function

# Join two years of detection data
detections_merged <- dplyr::bind_rows(detections$dets_2021,
                                      detections$dets_2022)

# check new data
str(detections_merged)
## tibble [20,751 × 8] (S3: tbl_df/tbl/data.frame)
##  $ array   : Factor w/ 6 levels "LU2","LU3","LU01",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ site    : Factor w/ 233 levels "LU2_03","LU2_05",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ species : Factor w/ 42 levels "Arctic hare",..: 28 28 35 35 35 35 35 35 35 35 ...
##  $ datetime: POSIXct[1:20751], format: "2021-07-15 11:40:07" "2022-02-06 15:11:06" ...
##  $ month   : num [1:20751] 7 2 7 8 8 8 8 8 8 9 ...
##  $ year    : num [1:20751] 2021 2022 2021 2021 2021 ...
##  $ timediff: num [1:20751] NA 296839 NA 28519 13230 ...
##  $ event_id: Factor w/ 20751 levels "E1000000","E1000001",..: 1 2 3 4 5 6 7 8 9 10 ...

Data checks

Let’s check to make sure the data looks good after merging

Arrays and sites

Let’s check that there are the correct number of levels for array and sites, there should be 6 arrays and 233 sites (155 from 2022-2023 and 78 from 2021-2022)

str(detections_merged)
## tibble [20,751 × 8] (S3: tbl_df/tbl/data.frame)
##  $ array   : Factor w/ 6 levels "LU2","LU3","LU01",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ site    : Factor w/ 233 levels "LU2_03","LU2_05",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ species : Factor w/ 42 levels "Arctic hare",..: 28 28 35 35 35 35 35 35 35 35 ...
##  $ datetime: POSIXct[1:20751], format: "2021-07-15 11:40:07" "2022-02-06 15:11:06" ...
##  $ month   : num [1:20751] 7 2 7 8 8 8 8 8 8 9 ...
##  $ year    : num [1:20751] 2021 2022 2021 2021 2021 ...
##  $ timediff: num [1:20751] NA 296839 NA 28519 13230 ...
##  $ event_id: Factor w/ 20751 levels "E1000000","E1000001",..: 1 2 3 4 5 6 7 8 9 10 ...
# check the levels for array and sites
levels(detections_merged$array)
## [1] "LU2"  "LU3"  "LU01" "LU13" "LU15" "LU21"
levels(detections_merged$site)
##   [1] "LU2_03"   "LU2_05"   "LU2_100"  "LU2_101"  "LU2_106"  "LU2_110" 
##   [7] "LU2_119"  "LU2_123"  "LU2_13"   "LU2_137"  "LU2_143"  "LU2_15"  
##  [13] "LU2_153"  "LU2_155"  "LU2_17"   "LU2_18"   "LU2_20"   "LU2_21"  
##  [19] "LU2_24"   "LU2_25"   "LU2_26"   "LU2_27"   "LU2_29"   "LU2_30"  
##  [25] "LU2_31"   "LU2_32"   "LU2_33"   "LU2_34"   "LU2_36"   "LU2_38"  
##  [31] "LU2_39"   "LU2_40"   "LU2_42"   "LU2_44"   "LU2_46"   "LU2_47"  
##  [37] "LU2_50"   "LU2_51"   "LU2_52"   "LU2_54"   "LU2_56"   "LU2_57"  
##  [43] "LU3_05"   "LU3_07"   "LU3_10"   "LU3_102"  "LU3_103"  "LU3_111" 
##  [49] "LU3_126"  "LU3_129"  "LU3_13"   "LU3_131"  "LU3_137"  "LU3_14"  
##  [55] "LU3_147"  "LU3_15"   "LU3_17"   "LU3_18"   "LU3_19"   "LU3_20"  
##  [61] "LU3_21"   "LU3_25"   "LU3_27"   "LU3_28"   "LU3_32"   "LU3_36"  
##  [67] "LU3_38"   "LU3_39"   "LU3_40"   "LU3_41"   "LU3_44"   "LU3_45"  
##  [73] "LU3_46"   "LU3_48"   "LU3_49"   "LU3_50"   "LU3_51"   "LU3_52"  
##  [79] "LU01_06"  "LU01_10"  "LU01_11"  "LU01_13"  "LU01_22"  "LU01_25" 
##  [85] "LU01_27"  "LU01_30"  "LU01_32"  "LU01_36"  "LU01_40"  "LU01_41" 
##  [91] "LU01_43"  "LU01_44"  "LU01_45"  "LU01_46"  "LU01_47"  "LU01_48" 
##  [97] "LU01_60"  "LU01_63"  "LU01_64"  "LU01_66"  "LU01_67"  "LU01_70" 
## [103] "LU01_71"  "LU01_72"  "LU01_73"  "LU01_74"  "LU01_75"  "LU01_76" 
## [109] "LU01_77"  "LU01_78"  "LU01_79"  "LU01_80"  "LU01_82"  "LU01_83" 
## [115] "LU01_84"  "LU01_85"  "LU01_86"  "LU13_03"  "LU13_05"  "LU13_06" 
## [121] "LU13_08"  "LU13_11"  "LU13_12"  "LU13_128" "LU13_13"  "LU13_131"
## [127] "LU13_14"  "LU13_15"  "LU13_16"  "LU13_17"  "LU13_18"  "LU13_19" 
## [133] "LU13_20"  "LU13_21"  "LU13_22"  "LU13_26"  "LU13_27"  "LU13_30" 
## [139] "LU13_32"  "LU13_33"  "LU13_34"  "LU13_35"  "LU13_36"  "LU13_37" 
## [145] "LU13_38"  "LU13_41"  "LU13_43"  "LU13_45"  "LU13_47"  "LU13_49" 
## [151] "LU13_51"  "LU13_52"  "LU13_53"  "LU13_55"  "LU13_56"  "LU13_57" 
## [157] "LU13_59"  "LU13_70"  "LU15_01"  "LU15_02"  "LU15_03"  "LU15_04" 
## [163] "LU15_07"  "LU15_08"  "LU15_09"  "LU15_10"  "LU15_11"  "LU15_12" 
## [169] "LU15_14"  "LU15_15"  "LU15_16"  "LU15_17"  "LU15_18"  "LU15_19" 
## [175] "LU15_20"  "LU15_21"  "LU15_22"  "LU15_23"  "LU15_24"  "LU15_25" 
## [181] "LU15_26"  "LU15_27"  "LU15_28"  "LU15_29"  "LU15_30"  "LU15_31" 
## [187] "LU15_32"  "LU15_34"  "LU15_36"  "LU15_37"  "LU15_40"  "LU15_41" 
## [193] "LU15_43"  "LU15_44"  "LU15_46"  "LU15_58"  "LU15_61"  "LU21_06" 
## [199] "LU21_09"  "LU21_10"  "LU21_100" "LU21_105" "LU21_106" "LU21_107"
## [205] "LU21_109" "LU21_114" "LU21_116" "LU21_119" "LU21_122" "LU21_126"
## [211] "LU21_14"  "LU21_153" "LU21_16"  "LU21_164" "LU21_21"  "LU21_23" 
## [217] "LU21_27"  "LU21_32"  "LU21_36"  "LU21_41"  "LU21_52"  "LU21_56" 
## [223] "LU21_57"  "LU21_59"  "LU21_63"  "LU21_68"  "LU21_74"  "LU21_78" 
## [229] "LU21_82"  "LU21_871" "LU21_93"  "LU21_97"  "LU21_98"

Everything looks good

NAs

Let’s ensure no NAs were introduced where there shouldn’t be during the joining process

# check for NAs introduced during data merge
summary(detections_merged)
##   array           site                    species    
##  LU2 :4711   LU01_47:  405   White-tailed deer:6141  
##  LU3 :1985   LU01_84:  396   Snowshoe hare    :4571  
##  LU01:6275   LU01_66:  383   Red squirrel     :2199  
##  LU13:1972   LU01_27:  320   Black bear       :1997  
##  LU15:2951   LU2_39 :  310   Coyote           :1285  
##  LU21:2857   LU2_50 :  292   Moose            : 693  
##              (Other):18645   (Other)          :3865  
##     datetime                          month             year     
##  Min.   :1979-12-31 23:00:00.00   Min.   : 1.000   Min.   :1979  
##  1st Qu.:2022-04-24 21:32:40.00   1st Qu.: 5.000   1st Qu.:2022  
##  Median :2022-11-03 19:10:30.00   Median : 8.000   Median :2022  
##  Mean   :2022-10-08 23:31:30.77   Mean   : 7.205   Mean   :2022  
##  3rd Qu.:2023-05-05 02:17:01.50   3rd Qu.:10.000   3rd Qu.:2023  
##  Max.   :2023-10-02 11:08:57.00   Max.   :12.000   Max.   :2023  
##                                                                  
##     timediff            event_id    
##  Min.   :      30   E1000000:    1  
##  1st Qu.:    1427   E1000001:    1  
##  Median :    5269   E1000002:    1  
##  Mean   :   31502   E1000003:    1  
##  3rd Qu.:   18422   E1000004:    1  
##  Max.   :22331161   E1000005:    1  
##  NA's   :2445       (Other) :20745

THe only NAs are in the timediff column which is what we expect since any of the first observations won’t have a timediff.

Data Formatting

In order to get plots that have the same formatting as last years’ report we have to do a bit of data formatting. First we need to make sure we are including the same relevant species (some were ignored for last years’ report or grouped together)

Last years report had the following species

  • white-tailed deer
  • snowshoe hare
  • black bear
  • coyote
  • red squirrel
  • fisher
  • unknown
  • moose
  • lynx
  • spruce grouse
  • red fox
  • striped skunk
  • ruffed grouse
  • owl
  • grey wolf
  • domestic dog
  • cougar
  • raven
  • other
  • mule deer

And they grouped all humans except for staff as ‘Humans’. Let’s look at the species we have in the combined years of data and try to format it the same way

detections_merged %>% 
  
  # group by array and species
  group_by(species) %>% 
  summarise(n = n()) %>% 
  
  # have R print everything
  print(n = nrow(.))
## # A tibble: 42 × 2
##    species                 n
##    <fct>               <int>
##  1 Arctic hare             1
##  2 ATVer                  41
##  3 Beaver                  4
##  4 Black bear           1997
##  5 Caribou               115
##  6 Cougar                 37
##  7 Coyote               1285
##  8 Domestic dog           15
##  9 Elk                     1
## 10 Fisher                253
## 11 Grey jay               66
## 12 Grey wolf             224
## 13 Human                  16
## 14 Lynx                  525
## 15 Marten                220
## 16 Moose                 693
## 17 Mule deer               2
## 18 Other                  11
## 19 Other birds           219
## 20 Owl                    15
## 21 Raven                  10
## 22 Red fox               127
## 23 Red squirrel         2199
## 24 Ruffed grouse          90
## 25 Snowmobiler            16
## 26 Snowshoe hare        4571
## 27 Spruce grouse          93
## 28 Staff                 462
## 29 Striped skunk          64
## 30 Unknown               657
## 31 Unknown canid          76
## 32 Unknown deer          340
## 33 Unknown mustelid       70
## 34 Unknown ungulate       34
## 35 White-tailed deer    6141
## 36 Canada goose            4
## 37 Hunter                  1
## 38 Long-tailed weasel     17
## 39 Otter                   7
## 40 Porcupine               5
## 41 Short-tailed weasel    19
## 42 Wolverine               8

Hmmm there is one instance of an arctic hare, check that this isn’t meant to be a snowshoe hare and fix later if needed.

Now let’s create a new data frame (tibble) to work with for the OSM figure summaries specifically

I personally would lump all the unknown together and all the birds together but for the sake of consistency with last year’s figures we will remove the same entries and keep the birds separate, let’s create a vector of entries to drop

species_drop <- c('Staff',
                  'Unknown deer',
                  'Unknown ungulate',
                  'Unknown canid',
                  'Unknown mustelid',
                  'Other birds',
                  'Arctic hare')

# now we can create the new data frame with some changes consistent w/ choices made for 2021-2022
detections_merged <- detections_merged %>% 
  
  # for summarizing, lets lump all the recreational humans into "Humans"
  mutate(species = recode_factor(species, 
                                 "Snowmobiler" = "Human",
                                 "ATVer" = "Human",
                                 'Hunter' = 'Human')) %>% 
  
  # remove species we don't want to plot
  filter(!species %in% species_drop)

# look at data
str(detections_merged)
## tibble [19,549 × 8] (S3: tbl_df/tbl/data.frame)
##  $ array   : Factor w/ 6 levels "LU2","LU3","LU01",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ site    : Factor w/ 233 levels "LU2_03","LU2_05",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ species : Factor w/ 39 levels "Human","Arctic hare",..: 33 33 33 33 33 33 33 33 33 33 ...
##  $ datetime: POSIXct[1:19549], format: "2021-07-18 08:59:30" "2021-08-07 04:21:14" ...
##  $ month   : num [1:19549] 7 8 8 8 8 8 8 9 9 9 ...
##  $ year    : num [1:19549] 2021 2021 2021 2021 2021 ...
##  $ timediff: num [1:19549] NA 28519 13230 4290 7337 ...
##  $ event_id: Factor w/ 20751 levels "E1000000","E1000001",..: 3 4 5 6 7 8 9 10 11 12 ...

Subset data

We will also want to subset the data by landscape unit (LU) and generate a new data frame for each LU to use for plotting

I’m not great at writing loops, so let’s see how this shit goes… probably bad but who knows

array_frames <- list()

for (i in unique(detections_merged$array)){
  
   #Subset data based on radius
  df <- detections_merged %>%
    filter(array == i)
  
 # list of dataframes
  array_frames <- c(array_frames, list(df))
  
 
}

# inspect one data frame
print(array_frames[[1]]) # this is for LU2
## # A tibble: 4,592 × 8
##    array site   species        datetime            month  year timediff event_id
##    <fct> <fct>  <fct>          <dttm>              <dbl> <dbl>    <dbl> <fct>   
##  1 LU2   LU2_03 White-tailed … 2021-07-18 08:59:30     7  2021      NA  E1000002
##  2 LU2   LU2_03 White-tailed … 2021-08-07 04:21:14     8  2021   28519. E1000003
##  3 LU2   LU2_03 White-tailed … 2021-08-16 08:51:21     8  2021   13230. E1000004
##  4 LU2   LU2_03 White-tailed … 2021-08-19 08:21:29     8  2021    4290. E1000005
##  5 LU2   LU2_03 White-tailed … 2021-08-24 10:39:23     8  2021    7337. E1000006
##  6 LU2   LU2_03 White-tailed … 2021-08-26 09:17:02     8  2021    2797. E1000007
##  7 LU2   LU2_03 White-tailed … 2021-08-31 19:13:33     8  2021    7796  E1000008
##  8 LU2   LU2_03 White-tailed … 2021-09-10 05:03:31     9  2021   13550. E1000009
##  9 LU2   LU2_03 White-tailed … 2021-09-16 17:10:41     9  2021    9363. E1000010
## 10 LU2   LU2_03 White-tailed … 2021-09-16 19:19:24     9  2021     127  E1000011
## # ℹ 4,582 more rows

… I think this worked

Now let’s change names of list items using purrr, couldn’t figure out how to name them in the loop, you don’t necessarily need to do this because we change the names in the next section, but I like having things named

array_frames <- array_frames %>%

  purrr::set_names('LU02',
                   'LU03',
                   'LU01',
                   'LU13',
                   'LU15',
                   'LU21') 

# inspect each data frame
head(array_frames$LU01)
## # A tibble: 6 × 8
##   array site    species        datetime            month  year timediff event_id
##   <fct> <fct>   <fct>          <dttm>              <dbl> <dbl>    <dbl> <fct>   
## 1 LU01  LU01_06 White-tailed … 2022-06-18 11:09:19     6  2022      NA  E2      
## 2 LU01  LU01_06 White-tailed … 2022-07-10 13:56:10     7  2022   31847. E3      
## 3 LU01  LU01_06 White-tailed … 2022-07-25 11:04:44     7  2022   21429. E4      
## 4 LU01  LU01_06 White-tailed … 2022-07-31 06:38:06     7  2022    8356. E5      
## 5 LU01  LU01_06 White-tailed … 2022-08-01 09:45:28     8  2022    1627. E6      
## 6 LU01  LU01_06 White-tailed … 2022-08-01 15:51:01     8  2022     364. E7

Detection plots

Detection data

Now we can apply the same data formatting for each LUs’ data frame using purrr.

We want to count the number of independent detections per species per LU to use in the detection plots

# apply the same formatting to each LU data frame using purrr map
detection_data <- array_frames %>% 
  
  purrr::map(
    ~.x %>% 
      
      # group by species
      group_by(species) %>%
      
      # calculate a column with unique accounts of each species
      mutate(count = n_distinct(event_id)) %>% 
      
      # keep just the columns we need
      select(species, count) %>% 
      
      # keep only unique (distinct) rows so we should be left with one row per species, this helps with plotting later if you don't do it ggplot will try to count and plot each row it's annoying
      distinct()) %>% 
  
  # set names of list objects
  purrr::set_names('Detections LU02',
                   'Detections LU03',
                   'Detections LU01',
                   'Detections LU13',
                   'Detections LU15',
                   'Detections LU21')

Detection plots

Now to graph independent detections for each LU using purrr, this avoids a TON of code repetition needed to plot each one individually

We use purrr::imap() instead of purrr::map() because imap maintains the variable names in our list (e.g. Detections LU01, Detections LU13, etc.) which we can then use to title each plot.

Within purrr::imap() we just paste the code we would use for a single ggplot since all the graphical elements (except the title which we change with the file name [.y]) are the same

# create object detection plots which uses the detection_data list (w/ all 4 LUs)
detection_plots <- detection_data %>% 
  
  # use imap instead of map as it allows us to use .y to paste the list element names as the plot titles later
  purrr::imap(
    ~.x %>% 
      
      # now just copy and paste the ggplot code for the detection graphs
      ggplot(.,
             aes(x = reorder(species, count), y = count)) +
      
      # plot as bar graph using geom_col so we don't have to provide a y aesthetic
      geom_col() +
      
      # switch the x and y axis
      coord_flip() +
      
      # add the number of detections at the end of each bar
      geom_text(aes(label = count),
                color = "black",
                size = 3,
                hjust = -0.3,
                vjust = 0.2) +
      
      # label x and y axis with informative titles
      labs(x = 'Species',
           y = 'Number of Independent (30 min) Detections') +
      
      # add title to plot with LU name the .y will take the name of whatever you named each list element in the detection_data list, so make sure this name is what you want on the ggtitle
      ggtitle(.y) +
      
      # set the theme
      theme_classic() +
      theme(plot.title = element_text(hjust = 0.5)))

# view plots, this will print each in it's own window so you have to scroll back in the plot viewer pane to look at each one
detection_plots
## $`Detections LU02`

## 
## $`Detections LU03`

## 
## $`Detections LU01`

## 
## $`Detections LU13`

## 
## $`Detections LU15`

## 
## $`Detections LU21`

Save detection plots

Now we want to save these plots in case we need each individual one (we will combine the detection and naive occ plots into a single figure for each LU later and use those for the OSM report, but we may want these standalone plots later so let’s save them while they are here).

We can save all the plots from the purrr iteration above using purrr::imap. imap is used instead of map because it allows us to retain the list object names (plot names) to paste as the file name with the .y command.

IMPORTANT if you are using this code for a future github repo, DO NOT use .tiff as the file extension. This will cause issues when trying to push any changes to the github repo as the files are too large to meet githubs requirements

# save plots only use if needed
purrr::imap(
  detection_plots,
  ~ggsave(.x,
             file = paste0("figures/OSM_",
                           .y,
                           '.jpg'), # avoid using .tiff extension in the github repo, those files are too large to push to origin
          dpi = 600,
          width = 11,
          height = 9,
          units = 'in'))

Naive occupancy

Data

We also need to alter the detection data a bit to use for naive occupancy plots.

We will use the individual LU detection data like we did before and use purrr::map() to apply the dame data formatting to all 4 data frames.

Here we want to calculate the total number of sites in each LU, the number of sites each species was detected at in each LU and then use both those numbers to calculate naive occupancy for each species in each LU

# First we need to alter the data frame a bit for these plots, let's create a data frame for each LU (I couldn't figure out how to do this without assigning individual data frames for each UGH)


# apply the same formatting to each data frame using purrr
occupancy_data <- array_frames %>% 
  
  purrr::map(
    ~.x %>% 
      
      # calculate the total number of sites for each LU
      mutate(total_sites = n_distinct(site)) %>% 
      
      # group by species to calculate the number of sites each spp occurred at
      group_by(species) %>% 
  
      # add columns to count the number of sites each spp occurred at and then the naive occupancy
  reframe(count = n_distinct(site),
          naive_occ = count/total_sites,
          ind_det = n_distinct(event_id)) %>% 
  
    # keep just the columns we need
  select(species, naive_occ, ind_det) %>% 
  
    # keep only unique (distinct) rows so we should be left with one row per species, this helps with plotting
  distinct()) %>% 
  
  purrr::set_names('Naive Occupancy LU02',
                   'Naive Occupancy LU03',
                   'Naive Occupancy LU01',
                   'Naive Occupancy LU13',
                   'Naive Occupancy LU15',
                   'Naive Occupancy LU21')

Occupancy plots

Now we can graph naive occupancy for each LU using purrr, and as with the detection plots this saves a massive amount of coding using purrr to run an iteration on the data files and produce four plots at once instead of copying and pasting code for each individually

# create object occupancy_plots which uses the occupancy_data list (w/ all 4 LUs)
occupancy_plots <- occupancy_data %>% 
  
  # use imap instead of map as it allows us to use .y to paste the list element names as the plot titles later
  purrr::imap(
    ~.x %>% 

      # now just copy and paste the ggplot code for the occupancy graphs
      ggplot(.,
             aes(x = fct_reorder(species,
                                 ind_det), # this reorders the species so they match the order of the detection plot which makes it better for viewing when the plots are arranged together in 1 figure for each LU
                 y = naive_occ)) +
      
      # plot as bars using geom_col() which uses stat = 'identity', instead of geom_bar() which will count the rows in each group and plot that instead of naive occ
      geom_col() +
      
      # flip x and y axis 
      coord_flip() +
      
      # add text to end of bars that provides naive occ value
      geom_text(aes(label = round(naive_occ, 2)), 
                size = 3, 
                hjust = -0.3, 
                vjust = 0.2) +
      
      # relabel x and y axis and title
      labs(x = 'Species',
           y = 'Proportion of Sites With At Least One Detection') +
      
      # set plot title using .y (name of list object)
      ggtitle(.y) +
      
      # set. theme elements
      theme_classic()+
      theme(plot.title = element_text(hjust = 0.5)))

# view plots
occupancy_plots
## $`Naive Occupancy LU02`

## 
## $`Naive Occupancy LU03`

## 
## $`Naive Occupancy LU01`

## 
## $`Naive Occupancy LU13`

## 
## $`Naive Occupancy LU15`

## 
## $`Naive Occupancy LU21`

Save occupancy plots

As with the detection plots, we might want these individual plots later for something so we can use purrr::imap() to save them to the figures folder

Again avoid using the .tiff extension in github

# save plots 
purrr::imap(
  occupancy_plots,
  ~ggsave(.x,
          file = paste0("figures/OSM_",
                        .y,
                        '.jpg'), # avoid using .tiff extension in the github repo, those files are too large to push to origin
          dpi = 600,
          width = 11,
          height = 9,
          units = 'in'))

Final combined plots for report

The previous year’s report had a figure for each LU with the detections plot on the top and the occupancy plot on the bottom so we will recreate these for this year using ggarrange().

Unfortunately I could not figure out how to do this in purrr to reduce coding but luckily it isn’t too much repetition

# not sure I know how to do the following section in purrr just yet, but we've saved a ton of coding so far and it doesn't take much to arrange each of these individually

# LU2
LU02_det_occ_plots <- ggarrange(detection_plots$`Detections LU02`, 
                               occupancy_plots$`Naive Occupancy LU02`,
                               labels = c("A", "B"),
                               nrow = 2)

# view plot
LU02_det_occ_plots

# LU3
LU03_det_occ_plots <- ggarrange(detection_plots$`Detections LU03`, 
                               occupancy_plots$`Naive Occupancy LU03`,
                               labels = c("A", "B"),
                               nrow = 2)

# view plot
LU03_det_occ_plots

# LU1

# arrange the plots so each LU has a figure with detections on top and naive occ on bottom
LU01_det_occ_plots <- ggarrange(detection_plots$`Detections LU01`, 
                               occupancy_plots$`Naive Occupancy LU01`,
                               labels = c("A", "B"),
                               nrow = 2)

# view plot
LU01_det_occ_plots

# LU13

# arrange the plots so each LU has a figure with detections on top and naive occ on bottom
LU13_det_occ_plots <- ggarrange(detection_plots$`Detections LU13`, 
                                occupancy_plots$`Naive Occupancy LU13`,
                               labels = c("A", "B"),
                               nrow = 2)

# view plot
LU13_det_occ_plots

# LU15

# arrange the plots so each LU has a figure with detections on top and naive occ on bottom
LU15_det_occ_plots <- ggarrange(detection_plots$`Detections LU15`, 
                                occupancy_plots$`Naive Occupancy LU15`,
                                labels = c("A", "B"),
                                nrow = 2)

# view plot
LU15_det_occ_plots

# LU21

# arrange the plots so each LU has a figure with detections on top and naive occ on bottom
LU21_det_occ_plots <- ggarrange(detection_plots$`Detections LU21`, 
                                occupancy_plots$`Naive Occupancy LU21`,
                                labels = c("A", "B"),
                                nrow = 2)

# view plot
LU21_det_occ_plots

We can however, save all the figures again using purrr

# save all figures at once using purrr
final_det_occ_plots <- list(LU02_det_occ_plots,
                            LU03_det_occ_plots,
                            LU01_det_occ_plots,
                            LU13_det_occ_plots,
                            LU15_det_occ_plots,
                            LU21_det_occ_plots) %>% 
  

  purrr::set_names('LU02_det_occ_plots',
                   'LU03_det_occ_plots',
                   'LU01_det_occ_plots',
                   'LU13_det_occ_plots',
                   'LU15_det_occ_plots',
                   'LU21_det_occ_plots') %>% 
  
  purrr::imap(
    ~ggsave(.x,
            file = paste0("figures/OSM_",
                          .y,
                          '.jpg'), # avoid using .tiff extension in the github repo, those files are too large to push to origin
            dpi = 600,
            width = 12,
            height = 15,
            units = 'in'))

Finish with detection data

Before proceeding let’s clear the objects currently in our environment since we don’t need them for the analysis

rm(list = ls(all.names = TRUE))

Analysis prep

Now we can start the analysis prep.

First we need to read in the proportional detection (response metrics) and covariate (explanatory metrics) data files for all 6 LUs (fiscal years 2021-2022 and 2022-2023)

Response metrics

Read in data files

We haven’t joined the response metrics from 2021-2022 and 2022-2023 in any previous scripts yet, so let’s read those in using purrr and then join them and take a look at the data to make sure it looks good.

# response metric (proportional detections from the from the ACME_camera_script_9-2-2024.R or .Rmd)

prop_detections <-  file.path('data/processed',
                         
                         c('OSM_proportional_detections_2021.csv',
                           'OSM_proportional_detections_2022.csv')) %>% 
  
  map(~.x %>%
        read_csv(.,
                 
                  # set the column types to read in correctly
                 col_types = cols(site = col_factor(),
                                  .default = col_number()))) %>% 
  
  # give names to each data frame in list
  purrr::set_names('prop_dets_2021',
                   'prop_dets_2022') # R doesn't like when they are just numbers, you can make it work but it's annoying to call the data frame later so I've called them prop_dets_year

# check variable structure
str(prop_detections)
## List of 2
##  $ prop_dets_2021: spc_tbl_ [78 × 23] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ site                    : Factor w/ 78 levels "LU2_03","LU2_05",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ black_bear              : num [1:78] 2 2 1 3 2 3 4 3 1 2 ...
##   ..$ coyote                  : num [1:78] 1 0 3 5 6 3 8 0 3 4 ...
##   ..$ fisher                  : num [1:78] 2 0 3 1 0 2 1 1 1 0 ...
##   ..$ snowshoe_hare           : num [1:78] 1 0 2 5 6 9 14 0 7 3 ...
##   ..$ white-tailed_deer       : num [1:78] 7 3 6 7 6 7 9 6 5 13 ...
##   ..$ cougar                  : num [1:78] 0 1 0 0 0 0 1 0 0 1 ...
##   ..$ lynx                    : num [1:78] 0 0 3 2 0 4 6 0 1 0 ...
##   ..$ red_fox                 : num [1:78] 0 0 3 0 0 0 0 1 0 0 ...
##   ..$ moose                   : num [1:78] 0 0 0 0 1 0 3 3 0 1 ...
##   ..$ grey_wolf               : num [1:78] 0 0 0 0 0 0 0 0 1 0 ...
##   ..$ caribou                 : num [1:78] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ absent_black_bear       : num [1:78] 3 3 4 2 3 8 7 2 4 9 ...
##   ..$ absent_coyote           : num [1:78] 6 7 4 2 1 11 6 7 4 10 ...
##   ..$ absent_fisher           : num [1:78] 5 7 4 6 7 12 13 6 6 14 ...
##   ..$ absent_snowshoe_hare    : num [1:78] 6 7 5 2 1 5 0 7 0 11 ...
##   ..$ absent_white-tailed_deer: num [1:78] 0 4 1 0 1 7 5 1 2 1 ...
##   ..$ absent_cougar           : num [1:78] 7 6 7 7 7 14 13 7 7 13 ...
##   ..$ absent_lynx             : num [1:78] 7 7 4 5 7 10 8 7 6 14 ...
##   ..$ absent_red_fox          : num [1:78] 7 7 4 7 7 14 14 6 7 14 ...
##   ..$ absent_moose            : num [1:78] 7 7 7 7 6 14 11 4 7 13 ...
##   ..$ absent_grey_wolf        : num [1:78] 7 7 7 7 7 14 14 7 6 14 ...
##   ..$ absent_caribou          : num [1:78] 7 7 7 7 7 14 14 7 7 14 ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   .default = col_number(),
##   .. ..   site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
##   .. ..   black_bear = col_number(),
##   .. ..   coyote = col_number(),
##   .. ..   fisher = col_number(),
##   .. ..   snowshoe_hare = col_number(),
##   .. ..   `white-tailed_deer` = col_number(),
##   .. ..   cougar = col_number(),
##   .. ..   lynx = col_number(),
##   .. ..   red_fox = col_number(),
##   .. ..   moose = col_number(),
##   .. ..   grey_wolf = col_number(),
##   .. ..   caribou = col_number(),
##   .. ..   absent_black_bear = col_number(),
##   .. ..   absent_coyote = col_number(),
##   .. ..   absent_fisher = col_number(),
##   .. ..   absent_snowshoe_hare = col_number(),
##   .. ..   `absent_white-tailed_deer` = col_number(),
##   .. ..   absent_cougar = col_number(),
##   .. ..   absent_lynx = col_number(),
##   .. ..   absent_red_fox = col_number(),
##   .. ..   absent_moose = col_number(),
##   .. ..   absent_grey_wolf = col_number(),
##   .. ..   absent_caribou = col_number()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr> 
##  $ prop_dets_2022: spc_tbl_ [154 × 25] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##   ..$ site                    : Factor w/ 154 levels "LU01_06","LU01_10",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ black_bear              : num [1:154] 7 3 4 7 8 9 4 5 7 7 ...
##   ..$ coyote                  : num [1:154] 4 4 8 10 11 9 11 0 9 4 ...
##   ..$ fisher                  : num [1:154] 4 3 3 3 2 1 1 2 0 3 ...
##   ..$ moose                   : num [1:154] 3 2 5 9 1 0 2 4 1 0 ...
##   ..$ snowshoe_hare           : num [1:154] 4 1 3 0 8 2 2 0 12 4 ...
##   ..$ white-tailed_deer       : num [1:154] 12 5 12 12 13 14 15 9 12 10 ...
##   ..$ cougar                  : num [1:154] 0 0 1 0 1 0 0 0 0 0 ...
##   ..$ grey_wolf               : num [1:154] 0 0 2 0 0 0 1 0 0 0 ...
##   ..$ lynx                    : num [1:154] 0 0 1 0 1 1 0 0 0 2 ...
##   ..$ red_fox                 : num [1:154] 0 0 2 0 0 0 0 0 4 0 ...
##   ..$ wolverine               : num [1:154] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ caribou                 : num [1:154] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ absent_black_bear       : num [1:154] 5 3 8 5 4 3 8 7 5 5 ...
##   ..$ absent_coyote           : num [1:154] 10 1 6 5 3 5 4 15 6 11 ...
##   ..$ absent_fisher           : num [1:154] 10 2 11 12 12 13 14 13 15 12 ...
##   ..$ absent_moose            : num [1:154] 11 3 9 6 13 14 13 11 14 15 ...
##   ..$ absent_snowshoe_hare    : num [1:154] 10 4 11 15 6 12 13 15 3 11 ...
##   ..$ absent_white-tailed_deer: num [1:154] 2 0 2 3 1 0 0 6 3 5 ...
##   ..$ absent_cougar           : num [1:154] 14 5 13 15 13 14 15 15 15 15 ...
##   ..$ absent_grey_wolf        : num [1:154] 14 5 12 15 14 14 14 15 15 15 ...
##   ..$ absent_lynx             : num [1:154] 14 5 13 15 13 13 15 15 15 13 ...
##   ..$ absent_red_fox          : num [1:154] 14 5 12 15 14 14 15 15 11 15 ...
##   ..$ absent_wolverine        : num [1:154] 14 5 14 15 14 14 15 15 15 15 ...
##   ..$ absent_caribou          : num [1:154] 14 5 14 15 14 14 15 15 15 15 ...
##   ..- attr(*, "spec")=
##   .. .. cols(
##   .. ..   .default = col_number(),
##   .. ..   site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
##   .. ..   black_bear = col_number(),
##   .. ..   coyote = col_number(),
##   .. ..   fisher = col_number(),
##   .. ..   moose = col_number(),
##   .. ..   snowshoe_hare = col_number(),
##   .. ..   `white-tailed_deer` = col_number(),
##   .. ..   cougar = col_number(),
##   .. ..   grey_wolf = col_number(),
##   .. ..   lynx = col_number(),
##   .. ..   red_fox = col_number(),
##   .. ..   wolverine = col_number(),
##   .. ..   caribou = col_number(),
##   .. ..   absent_black_bear = col_number(),
##   .. ..   absent_coyote = col_number(),
##   .. ..   absent_fisher = col_number(),
##   .. ..   absent_moose = col_number(),
##   .. ..   absent_snowshoe_hare = col_number(),
##   .. ..   `absent_white-tailed_deer` = col_number(),
##   .. ..   absent_cougar = col_number(),
##   .. ..   absent_grey_wolf = col_number(),
##   .. ..   absent_lynx = col_number(),
##   .. ..   absent_red_fox = col_number(),
##   .. ..   absent_wolverine = col_number(),
##   .. ..   absent_caribou = col_number()
##   .. .. )
##   ..- attr(*, "problems")=<externalptr>

Merge data files

Now we need to merge the two data files for analysis. We can do this with dplyr.

# merge the proportional detections files so there are rows for both fiscal years
prop_dets_all <- dplyr::bind_rows(prop_detections$prop_dets_2021,
                                  prop_detections$prop_dets_2022)

print(prop_dets_all)
## # A tibble: 232 × 25
##    site  black_bear coyote fisher snowshoe_hare `white-tailed_deer` cougar  lynx
##    <fct>      <dbl>  <dbl>  <dbl>         <dbl>               <dbl>  <dbl> <dbl>
##  1 LU2_…          2      1      2             1                   7      0     0
##  2 LU2_…          2      0      0             0                   3      1     0
##  3 LU2_…          1      3      3             2                   6      0     3
##  4 LU2_…          3      5      1             5                   7      0     2
##  5 LU2_…          2      6      0             6                   6      0     0
##  6 LU2_…          3      3      2             9                   7      0     4
##  7 LU2_…          4      8      1            14                   9      1     6
##  8 LU2_…          3      0      1             0                   6      0     0
##  9 LU2_…          1      3      1             7                   5      0     1
## 10 LU2_…          2      4      0             3                  13      1     0
## # ℹ 222 more rows
## # ℹ 17 more variables: red_fox <dbl>, moose <dbl>, grey_wolf <dbl>,
## #   caribou <dbl>, absent_black_bear <dbl>, absent_coyote <dbl>,
## #   absent_fisher <dbl>, absent_snowshoe_hare <dbl>,
## #   `absent_white-tailed_deer` <dbl>, absent_cougar <dbl>, absent_lynx <dbl>,
## #   absent_red_fox <dbl>, absent_moose <dbl>, absent_grey_wolf <dbl>,
## #   absent_caribou <dbl>, wolverine <dbl>, absent_wolverine <dbl>

Format data

This looks good except since there were no wolverines in the first fiscal year of monitoring (LU02 and LU03) those columns have NAs for both arrays, we want to replace those NAs with Zeros and move the wolverine column to the correct location

Let’s do that now.

prop_dets_all <- prop_dets_all %>% 
  
   # replace NAs introduced from joining data to zeros
  replace(is.na(.),
          0) %>% 
  
  # relocate wolverine column
  relocate(.,
           wolverine,
           .after = caribou)

# check data
head(prop_dets_all)
## # A tibble: 6 × 25
##   site   black_bear coyote fisher snowshoe_hare `white-tailed_deer` cougar  lynx
##   <fct>       <dbl>  <dbl>  <dbl>         <dbl>               <dbl>  <dbl> <dbl>
## 1 LU2_03          2      1      2             1                   7      0     0
## 2 LU2_05          2      0      0             0                   3      1     0
## 3 LU2_1…          1      3      3             2                   6      0     3
## 4 LU2_1…          3      5      1             5                   7      0     2
## 5 LU2_1…          2      6      0             6                   6      0     0
## 6 LU2_1…          3      3      2             9                   7      0     4
## # ℹ 17 more variables: red_fox <dbl>, moose <dbl>, grey_wolf <dbl>,
## #   caribou <dbl>, wolverine <dbl>, absent_black_bear <dbl>,
## #   absent_coyote <dbl>, absent_fisher <dbl>, absent_snowshoe_hare <dbl>,
## #   `absent_white-tailed_deer` <dbl>, absent_cougar <dbl>, absent_lynx <dbl>,
## #   absent_red_fox <dbl>, absent_moose <dbl>, absent_grey_wolf <dbl>,
## #   absent_caribou <dbl>, absent_wolverine <dbl>

Looks good!

Covariates

In the previous script, 2_ACME_landscape_covariate_exploration_script.Rmd we joined the two fiscal years of data and grouped the covariates for analysis and saved this data as a c.sv file, so we can read in this file now and we shouldn’t have to do any further formatting at the moment

Read data

We will check the data structure after reading in the file just to make sure everything looks good.

covariates_all <- read_csv('data/processed/OSM_covariates_grouped_2021_2022.csv',
                           
                           # set the column types to read in correctly
                           col_types = cols(array = col_factor(),
                                            camera = col_factor(),
                                            site = col_factor(),
                                            buff_dist = col_factor(),
                                            .default = col_number()))
## Warning: The following named parsers don't match the column names: camera
str(covariates_all)
## spc_tbl_ [4,660 × 19] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ array             : Factor w/ 6 levels "LU13","LU15",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ site              : Factor w/ 233 levels "LU13_18","LU13_15",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ buff_dist         : Factor w/ 20 levels "250","500","750",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ harvest           : num [1:4660] 0 0 0.687 0.337 0 ...
##  $ pipeline          : num [1:4660] 0 0.068 0 0 0.0301 ...
##  $ roads             : num [1:4660] 0 0.0174 0 0 0 ...
##  $ seismic_lines     : num [1:4660] 0 0.03277 0 0.00889 0.01144 ...
##  $ seismic_lines_3D  : num [1:4660] 0 0 0 0 0.0523 ...
##  $ trails            : num [1:4660] 0.00588 0.0028 0 0.01591 0 ...
##  $ transmission_lines: num [1:4660] 0.0642 0 0 0 0.091 ...
##  $ veg_edges         : num [1:4660] 0 0.0858 0 0 0 ...
##  $ wells             : num [1:4660] 0 0 0 0 0.0322 ...
##  $ lc_grassland      : num [1:4660] 0.193 0.348 0 0 0.178 ...
##  $ lc_coniferous     : num [1:4660] 0.456 0.358 0.186 1 0.822 ...
##  $ lc_broadleaf      : num [1:4660] 0 0 0 0 0 ...
##  $ lc_mixed          : num [1:4660] 0 0.101 0.255 0 0 ...
##  $ lc_developed      : num [1:4660] 0 0.0916 0 0 0 ...
##  $ lc_shrub          : num [1:4660] 0.316 0 0.559 0 0 ...
##  $ osm_industrial    : num [1:4660] 0.383 0.157 0 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   .default = col_number(),
##   ..   array = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
##   ..   site = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
##   ..   buff_dist = col_factor(levels = NULL, ordered = FALSE, include_na = FALSE),
##   ..   harvest = col_number(),
##   ..   pipeline = col_number(),
##   ..   roads = col_number(),
##   ..   seismic_lines = col_number(),
##   ..   seismic_lines_3D = col_number(),
##   ..   trails = col_number(),
##   ..   transmission_lines = col_number(),
##   ..   veg_edges = col_number(),
##   ..   wells = col_number(),
##   ..   lc_grassland = col_number(),
##   ..   lc_coniferous = col_number(),
##   ..   lc_broadleaf = col_number(),
##   ..   lc_mixed = col_number(),
##   ..   lc_developed = col_number(),
##   ..   lc_shrub = col_number(),
##   ..   osm_industrial = col_number()
##   .. )
##  - attr(*, "problems")=<externalptr>

Everything looks good!

###Subset data by buffer

We need to subset the data so we have separate data frames for each buffer width to work with in the analysis AND to explore correlation between variables at each buffer width, as these may very with spatial scales

Let’s use a for loop to subset the data

buffer_frames <- list()

for (i in unique(covariates_all$buff_dist)){
  
  print(i)
  
  # Subset data based on radius
  df <- covariates_all %>%
    filter(buff_dist == i)
  
  # list of dataframes
  buffer_frames <-c (buffer_frames, list(df))
}
## [1] "250"
## [1] "500"
## [1] "750"
## [1] "1000"
## [1] "1250"
## [1] "1500"
## [1] "1750"
## [1] "2000"
## [1] "2250"
## [1] "2500"
## [1] "2750"
## [1] "3000"
## [1] "3250"
## [1] "3500"
## [1] "3750"
## [1] "4000"
## [1] "4250"
## [1] "4500"
## [1] "4750"
## [1] "5000"
# name list objects so we can extract names for plotting 

buffer_frames <- buffer_frames %>% 
  
  # absurdly long way to do this but for sake of time fuck it
  purrr::set_names('250 meter buffer',
                   '500 meter buffer',
                   '750 meter buffer',
                   '1000 meter buffer',
                   '1250 meter buffer',
                   '1500 meter buffer',
                   '1750 meter buffer',
                   '2000 meter buffer',
                   '2250 meter buffer',
                   '2500 meter buffer',
                   '2750 meter buffer',
                   '3000 meter buffer',
                   '3250 meter buffer',
                   '3500 meter buffer',
                   '3750 meter buffer',
                   '4000 meter buffer',
                   '4250 meter buffer',
                   '4500 meter buffer',
                   '4750 meter buffer',
                   '5000 meter buffer')

Now we have a list with data frames for each buffer width which we can work with later.

Add response metric

Now that we have the covariate data formatted we need to add the response metric (monthly proportional presence/absence) to the data frames

osm_final_df_2021_2022 <- buffer_frames %>% 
  
  purrr::map(
    ~.x %>% 
      
      left_join(prop_dets_all,
                by = 'site'))

Finish with data formatting

Let’s remove the objects we no longer need from the environment to keep our workspace clean

rm(covariates_all,
   prop_detections,
   df,
   i)

Analysis

Now we are going to run a global model which includes all HFI and LC variables that at first glance (will do a more thorough check later) seem to have enough data to include as covariates for each buffer width, and then we will compare these models see which buffer width best fit the data for each species. After that we will optimize models so they don’t includes any variables that are highly correlated.

We don’t need to do ALL the species since many don’t have enough data.

Refer to the 1_ACME_camera_script_9-2-2024.html or .Rmd the plot for proportional monthly detections should provide info on which species we have enough data for, can be found under Response metrics/3.Proportion monthly detections

A brief look at this fig indicates that we have enough for all the mammals in the prop_detections data frame except

  • cougar
  • wolverine
  • caribou??? (may have enough, may not)

Black bear

there is probably a way to shorten the following code to select particular species, I saw Andrew’s for loop in the draft script he wrote but couldn’t quite figure out how to adapt it to my purposes with the data formatted the way I have it, so I did this instead, maybe we can merge approaches later to clean this up if deemed necessary? But it certainly functions for now and is understandable… I think.

Global models

Let’s start with bears and use purrr to create a global model for every buffer distance

Recall purrr::map() is magical for iterations and will apply all the functions within the map() function to each item of the list supplied before the the map() function.

# create models for black bears at each buffer size
black_bear_mods <- osm_final_df_2021_2022 %>%

  # use purrr map to fun the same functions for all buffer sizes ((all objects in list))
  purrr::map(
    ~.x %>%

      # glmmTMB function let's us run the proportional binomial model using cbind to combine the present and absent columns for each species
     glmmTMB::glmmTMB(cbind(black_bear, absent_black_bear) ~
                        
                        # HFI
                        scale(harvest) +
                        scale(pipeline) +
                        scale(roads) +
                        scale(seismic_lines) +
                        scale(seismic_lines_3D) +
                        scale(trails) +
                        scale(transmission_lines) +
                        scale(veg_edges) +
                        scale(wells) +
                        scale(osm_industrial) +
                        
                        # VEG covariates in numerical order
                        scale(lc_grassland) +
                        scale(lc_coniferous) +
                        scale(lc_broadleaf) +
                        scale(lc_mixed) +
                        scale(lc_developed) +
                        scale(lc_shrub) +
                        
                        # Random effect of array
                        (1|array),
                   data = .,
                   family = 'binomial'))

Model selection

We will use the model.sel() function from the MuMIn package to compare the global models for each buffer width and see which buffer fits the black bear data best

model.sel(black_bear_mods)
## Model selection table 
##                   cnd((Int)) dsp((Int)) cnd(scl(hrv)) cnd(scl(lc_brd))
## 250 meter buffer     -0.5998          +      0.012140           0.1221
## 1750 meter buffer    -0.5987          +      0.008720           0.3954
## 2000 meter buffer    -0.5976          +      0.005699           0.4518
## 1250 meter buffer    -0.5996          +      0.008946           0.3455
## 1500 meter buffer    -0.6002          +      0.016130           0.3382
## 1000 meter buffer    -0.5983          +     -0.001997           0.4015
## 5000 meter buffer    -0.6074          +      0.047290           0.1153
## 2250 meter buffer    -0.5960          +     -0.001043           0.4491
## 2750 meter buffer    -0.5959          +      0.005900           0.4160
## 2500 meter buffer    -0.5954          +      0.001508           0.4295
## 3750 meter buffer    -0.6021          +      0.024810           0.3098
## 4000 meter buffer    -0.6037          +      0.035760           0.3021
## 3500 meter buffer    -0.6004          +      0.023550           0.3449
## 3250 meter buffer    -0.5989          +      0.024520           0.3668
## 4750 meter buffer    -0.6059          +      0.037070           0.1895
## 4250 meter buffer    -0.6040          +      0.040600           0.2960
## 3000 meter buffer    -0.5967          +      0.011800           0.3862
## 500 meter buffer     -0.6006          +      0.029270           0.2784
## 4500 meter buffer    -0.6048          +      0.037590           0.2450
## 750 meter buffer     -0.5978          +      0.022290           0.3135
##                   cnd(scl(lc_cnf)) cnd(scl(lc_dvl)) cnd(scl(lc_grs))
## 250 meter buffer           0.01706        -0.086020         0.067700
## 1750 meter buffer          0.28860        -0.149500        -0.001904
## 2000 meter buffer          0.37870        -0.137500         0.004301
## 1250 meter buffer          0.18040        -0.146700        -0.007598
## 1500 meter buffer          0.18560        -0.166800        -0.010420
## 1000 meter buffer          0.25690        -0.112100         0.021270
## 5000 meter buffer         -0.05003        -0.101400         0.093600
## 2250 meter buffer          0.39070        -0.153800        -0.004265
## 2750 meter buffer          0.35480        -0.169300         0.014090
## 2500 meter buffer          0.37300        -0.152200        -0.006222
## 3750 meter buffer          0.17330        -0.159600         0.034770
## 4000 meter buffer          0.15840        -0.140500         0.061200
## 3500 meter buffer          0.22620        -0.155000         0.028350
## 3250 meter buffer          0.27570        -0.156100         0.024090
## 4750 meter buffer          0.03390        -0.103500         0.088520
## 4250 meter buffer          0.14460        -0.117100         0.074410
## 3000 meter buffer          0.31280        -0.166900         0.015930
## 500 meter buffer           0.15610        -0.001471         0.108800
## 4500 meter buffer          0.08703        -0.113800         0.082690
## 750 meter buffer           0.18220        -0.048030         0.044580
##                   cnd(scl(lc_mxd)) cnd(scl(lc_shr)) cnd(scl(osm_ind))
## 250 meter buffer          0.054090          0.11220          0.046220
## 1750 meter buffer         0.150400          0.17960         -0.023060
## 2000 meter buffer         0.195400          0.22580         -0.030690
## 1250 meter buffer         0.133300          0.15110          0.026870
## 1500 meter buffer         0.114200          0.14190          0.004238
## 1000 meter buffer         0.184100          0.17820         -0.002263
## 5000 meter buffer         0.008365          0.04685         -0.077250
## 2250 meter buffer         0.205400          0.23040         -0.045190
## 2750 meter buffer         0.189300          0.21420         -0.071020
## 2500 meter buffer         0.197600          0.22710         -0.056540
## 3750 meter buffer         0.093200          0.14440         -0.106300
## 4000 meter buffer         0.078050          0.13870         -0.085970
## 3500 meter buffer         0.112300          0.16360         -0.106600
## 3250 meter buffer         0.133400          0.17710         -0.106000
## 4750 meter buffer         0.044460          0.08694         -0.084660
## 4250 meter buffer         0.071410          0.14580         -0.070480
## 3000 meter buffer         0.167100          0.19230         -0.090640
## 500 meter buffer          0.114000          0.14040          0.023410
## 4500 meter buffer         0.056320          0.11480         -0.080600
## 750 meter buffer          0.138500          0.11560          0.035930
##                   cnd(scl(ppl)) cnd(scl(rds)) cnd(scl(ssm_lns_3D))
## 250 meter buffer      4.783e-02     -0.104000             -0.11560
## 1750 meter buffer    -7.498e-02      0.158300             -0.11170
## 2000 meter buffer    -9.514e-02      0.176000             -0.10060
## 1250 meter buffer     9.031e-03     -0.006562             -0.13600
## 1500 meter buffer    -2.918e-02      0.073920             -0.12820
## 1000 meter buffer    -9.271e-03      0.017020             -0.12980
## 5000 meter buffer    -4.227e-01      0.005939             -0.08420
## 2250 meter buffer    -9.128e-02      0.202100             -0.09823
## 2750 meter buffer    -1.476e-01      0.273700             -0.08796
## 2500 meter buffer    -1.030e-01      0.245000             -0.09330
## 3750 meter buffer    -1.610e-01      0.117700             -0.08006
## 4000 meter buffer    -2.101e-01      0.062950             -0.06989
## 3500 meter buffer    -1.497e-01      0.174000             -0.07614
## 3250 meter buffer    -1.592e-01      0.241900             -0.07557
## 4750 meter buffer    -3.426e-01      0.037880             -0.07104
## 4250 meter buffer    -2.148e-01      0.031390             -0.06951
## 3000 meter buffer    -1.587e-01      0.273800             -0.08137
## 500 meter buffer      7.892e-05     -0.133900             -0.10010
## 4500 meter buffer    -2.709e-01      0.042870             -0.06702
## 750 meter buffer      2.574e-02     -0.074510             -0.12570
##                   cnd(scl(ssm_lns)) cnd(scl(trl)) cnd(scl(trn_lns))
## 250 meter buffer           -0.03550       0.10260         -0.129200
## 1750 meter buffer          -0.18240       0.07213         -0.092160
## 2000 meter buffer          -0.17810       0.07224         -0.078630
## 1250 meter buffer          -0.13410       0.08228         -0.128800
## 1500 meter buffer          -0.16190       0.07507         -0.114800
## 1000 meter buffer          -0.12410       0.09118         -0.114300
## 5000 meter buffer          -0.13330       0.01758          0.224000
## 2250 meter buffer          -0.17130       0.07448         -0.093590
## 2750 meter buffer          -0.16380       0.05180         -0.067080
## 2500 meter buffer          -0.16810       0.06390         -0.086760
## 3750 meter buffer          -0.15790       0.07173         -0.004969
## 4000 meter buffer          -0.15410       0.06589          0.029670
## 3500 meter buffer          -0.15900       0.06431         -0.033120
## 3250 meter buffer          -0.15790       0.05717         -0.044450
## 4750 meter buffer          -0.14610       0.03566          0.154500
## 4250 meter buffer          -0.16110       0.05966          0.052200
## 3000 meter buffer          -0.15550       0.04647         -0.055050
## 500 meter buffer           -0.04670       0.05432         -0.104000
## 4500 meter buffer          -0.15530       0.05068          0.096340
## 750 meter buffer           -0.06456       0.04651         -0.107200
##                   cnd(scl(veg_edg)) cnd(scl(wll)) df   logLik  AICc delta
## 250 meter buffer          -0.009170       -0.0136 18 -443.782 926.8  0.00
## 1750 meter buffer         -0.047050        0.1401 18 -444.092 927.4  0.62
## 2000 meter buffer         -0.062020        0.1507 18 -444.158 927.5  0.75
## 1250 meter buffer          0.033960        0.1370 18 -444.226 927.7  0.89
## 1500 meter buffer         -0.008584        0.1543 18 -444.304 927.8  1.04
## 1000 meter buffer          0.027230        0.1196 18 -444.336 927.9  1.11
## 5000 meter buffer         -0.102300        0.4047 18 -444.816 928.8  2.07
## 2250 meter buffer         -0.058860        0.1602 18 -445.030 929.3  2.50
## 2750 meter buffer         -0.099740        0.1762 18 -445.409 930.0  3.25
## 2500 meter buffer         -0.093760        0.1587 18 -445.417 930.0  3.27
## 3750 meter buffer         -0.052440        0.2391 18 -445.548 930.3  3.53
## 4000 meter buffer         -0.053510        0.2714 18 -445.553 930.3  3.54
## 3500 meter buffer         -0.069670        0.2051 18 -445.859 930.9  4.15
## 3250 meter buffer         -0.096130        0.1884 18 -445.949 931.1  4.33
## 4750 meter buffer         -0.086190        0.3327 18 -446.118 931.4  4.67
## 4250 meter buffer         -0.062220        0.2681 18 -446.193 931.6  4.82
## 3000 meter buffer         -0.100500        0.1798 18 -446.234 931.7  4.90
## 500 meter buffer           0.024600        0.0979 18 -446.296 931.8  5.03
## 4500 meter buffer         -0.073450        0.2919 18 -446.364 931.9  5.16
## 750 meter buffer          -0.003772        0.1360 18 -447.921 935.1  8.28
##                   weight
## 250 meter buffer   0.159
## 1750 meter buffer  0.117
## 2000 meter buffer  0.109
## 1250 meter buffer  0.102
## 1500 meter buffer  0.094
## 1000 meter buffer  0.091
## 5000 meter buffer  0.057
## 2250 meter buffer  0.046
## 2750 meter buffer  0.031
## 2500 meter buffer  0.031
## 3750 meter buffer  0.027
## 4000 meter buffer  0.027
## 3500 meter buffer  0.020
## 3250 meter buffer  0.018
## 4750 meter buffer  0.015
## 4250 meter buffer  0.014
## 3000 meter buffer  0.014
## 500 meter buffer   0.013
## 4500 meter buffer  0.012
## 750 meter buffer   0.003
## Models ranked by AICc(x) 
## Random terms (all models): 
##   cond(1 | array)

Looks like the smallest buffer (250) fits the data best for black bears.

Let’s look at this model closer

Model summary

summary(black_bear_mods$`250 meter buffer`)
##  Family: binomial  ( logit )
## Formula:          
## cbind(black_bear, absent_black_bear) ~ scale(harvest) + scale(pipeline) +  
##     scale(roads) + scale(seismic_lines) + scale(seismic_lines_3D) +  
##     scale(trails) + scale(transmission_lines) + scale(veg_edges) +  
##     scale(wells) + scale(osm_industrial) + scale(lc_grassland) +  
##     scale(lc_coniferous) + scale(lc_broadleaf) + scale(lc_mixed) +  
##     scale(lc_developed) + scale(lc_shrub) + (1 | array)
## Data: .
## 
##      AIC      BIC   logLik deviance df.resid 
##    923.6    985.6   -443.8    887.6      214 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  array  (Intercept) 0.05453  0.2335  
## Number of obs: 232, groups:  array, 6
## 
## Conditional model:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -0.59976    0.10675  -5.618 1.93e-08 ***
## scale(harvest)             0.01214    0.05072   0.239   0.8109    
## scale(pipeline)            0.04783    0.07103   0.673   0.5007    
## scale(roads)              -0.10401    0.09070  -1.147   0.2515    
## scale(seismic_lines)      -0.03550    0.05305  -0.669   0.5034    
## scale(seismic_lines_3D)   -0.11563    0.05881  -1.966   0.0493 *  
## scale(trails)              0.10263    0.04771   2.151   0.0315 *  
## scale(transmission_lines) -0.12923    0.06773  -1.908   0.0564 .  
## scale(veg_edges)          -0.00917    0.07788  -0.118   0.9063    
## scale(wells)              -0.01360    0.05402  -0.252   0.8012    
## scale(osm_industrial)      0.04622    0.05292   0.873   0.3825    
## scale(lc_grassland)        0.06770    0.09942   0.681   0.4959    
## scale(lc_coniferous)       0.01706    0.26332   0.065   0.9483    
## scale(lc_broadleaf)        0.12207    0.18033   0.677   0.4985    
## scale(lc_mixed)            0.05409    0.15802   0.342   0.7321    
## scale(lc_developed)       -0.08602    0.09897  -0.869   0.3847    
## scale(lc_shrub)            0.11218    0.14259   0.787   0.4314    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Nothing looks fishy in the model summary for now, we will look at this more closely once we have a true top model.

Subset models

Autocorrelation 250m

Before we can develop model subsets we need to see what variables can be included in the same model at this buffer width.

Let’s use the chart.Correlation() function in the Performance Analytics package to look at this.

buffer_frames$`250 meter buffer` %>% 
  
  select_if(is.numeric) %>% 
  
   # use chart.correlation 
      chart.Correlation(.,
                        histogram = TRUE, 
                        method = "pearson")

mtext('250 meter buffer', side = 3, line = 3)

List of correlated variables:

  • pipeline & transmission_lines 0.53
  • roads & veg_edges 0.71
  • roads & lc_developed 0.57

Model list

Jake/Andrew can you suggest some additional subset models if needed

Let’s create another global model without these correlated variables. I’m going to select transmission_lines over pipeline because the summary from earlier showed transmission lines had larger effect on black bear presence, and I’m going to choose to keep roads.

# global model w/ non-correlated variables
bear_global_250 <-  glmmTMB::glmmTMB(cbind(black_bear, absent_black_bear) ~
                        
                        # HFI
                        scale(harvest) +
                        scale(roads) +
                        scale(seismic_lines) +
                        scale(seismic_lines_3D) +
                        scale(trails) +
                        scale(transmission_lines) +
                        scale(wells) +
                        scale(osm_industrial) +
                        
                        # VEG covariates in numerical order
                        scale(lc_grassland) +
                        scale(lc_coniferous) +
                        scale(lc_broadleaf) +
                        scale(lc_mixed) +
                        scale(lc_shrub) +
                        
                        # Random effect of array
                        (1|array),
                   data = osm_final_df_2021_2022$`250 meter buffer`,
                   family = 'binomial')

# null model to compare
bear_null_250 <- glmmTMB::glmmTMB(cbind(black_bear, absent_black_bear) ~ 1 +
                        
                        # Random effect of array
                        (1|array),
                   data = osm_final_df_2021_2022$`250 meter buffer`,
                   family = 'binomial')

# second model is based on linear features providing easier movement through boreal forest
bear_linear_250 <- glmmTMB::glmmTMB(cbind(black_bear, absent_black_bear) ~
                        
                        # HFI
                        scale(roads) +
                        scale(seismic_lines) +
                        scale(seismic_lines_3D) +
                        scale(trails) +
                        scale(transmission_lines) +
                        
                        # Random effect of array
                        (1|array),
                   data = osm_final_df_2021_2022$`250 meter buffer`,
                   family = 'binomial')

Model selection

Now let’s look at a model selection table with our subset models.

model.sel(bear_global_250,
          bear_null_250,
          bear_linear_250)
## Model selection table 
##                 cnd((Int)) dsp((Int)) cnd(scl(hrv)) cnd(scl(lc_brd))
## bear_linear_250    -0.5986          +                               
## bear_global_250    -0.5995          +       0.01223           0.2326
## bear_null_250      -0.5884          +                               
##                 cnd(scl(lc_cnf)) cnd(scl(lc_grs)) cnd(scl(lc_mxd))
## bear_linear_250                                                   
## bear_global_250           0.1823           0.1248           0.1455
## bear_null_250                                                     
##                 cnd(scl(lc_shr)) cnd(scl(osm_ind)) cnd(scl(rds))
## bear_linear_250                                          -0.1586
## bear_global_250           0.1891           0.04922       -0.1215
## bear_null_250                                                   
##                 cnd(scl(ssm_lns_3D)) cnd(scl(ssm_lns)) cnd(scl(trl))
## bear_linear_250              -0.1311          -0.04436       0.10250
## bear_global_250              -0.1135          -0.04271       0.09811
## bear_null_250                                                       
##                 cnd(scl(trn_lns)) cnd(scl(wll)) df   logLik  AICc delta weight
## bear_linear_250          -0.09901                7 -449.674 913.8  0.00  0.972
## bear_global_250          -0.10520      -0.01348 15 -444.383 921.0  7.14  0.027
## bear_null_250                                    2 -463.090 930.2 16.38  0.000
## Models ranked by AICc(x) 
## Random terms (all models): 
##   cond(1 | array)

Looks like the linear feature model is best, SO FAR

Top model

  • 250 meter buffer

Summary

summary(bear_linear_250)
##  Family: binomial  ( logit )
## Formula:          
## cbind(black_bear, absent_black_bear) ~ scale(roads) + scale(seismic_lines) +  
##     scale(seismic_lines_3D) + scale(trails) + scale(transmission_lines) +  
##     (1 | array)
## Data: osm_final_df_2021_2022$`250 meter buffer`
## 
##      AIC      BIC   logLik deviance df.resid 
##    913.3    937.5   -449.7    899.3      225 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  array  (Intercept) 0.05426  0.2329  
## Number of obs: 232, groups:  array, 6
## 
## Conditional model:
##                           Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -0.59856    0.10644  -5.624 1.87e-08 ***
## scale(roads)              -0.15864    0.05694  -2.786  0.00534 ** 
## scale(seismic_lines)      -0.04436    0.05006  -0.886  0.37556    
## scale(seismic_lines_3D)   -0.13114    0.05635  -2.327  0.01995 *  
## scale(trails)              0.10247    0.04641   2.208  0.02724 *  
## scale(transmission_lines) -0.09901    0.05653  -1.751  0.07987 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Odds ratios

Let’s extract the odds ratios for the top model so we can plot them for data vis later.

bear_model_odds <- bear_linear_250 %>% 

  # extract the coefficients and upper and lower CI
      confint() %>% 
 
      # format resulting object as a tibble data frame
      as_tibble() %>%
  
      
      # add a column where we can put the feature names
      rowid_to_column() %>% 
      
      # rename the columns for plotting
      rename('lower' = `2.5 %`,
             'upper' = `97.5 %`,
             'estimate' = Estimate,
             'feature' = rowid) %>% 
      
      # rename the entries to features, need to look at the order the features are in from the model summary and ensure it matches
      mutate(feature = as.factor(feature),
             feature = recode(feature,
                              '1' = 'intercept',
                              '2' = 'roads',
                              '3' = 'seismic_lines',
                              '4' = 'seismic_lines_3D',
                              '5' = 'trails',
                              '6' = 'transmission_lines',
                              '7' = 'intercept_array')) %>% 
  
  # remove intercepts
  filter(!grepl('intercept',
                feature))

Plot odds ratios

First let’s get a silhouette for this graphy from phylopic

black_bear_img <- get_phylopic(get_uuid(name = 'Ursus americanus'))

Now let’s use ggplot to plot the odds ratios for each feature in the top model

# provide data and mapping aesthetics
ggplot(bear_model_odds, aes(x = feature, 
                          y = estimate)) +
  
   geom_errorbar(aes(ymin = lower, 
                     ymax = upper),
                width = 0.4,
                linewidth = 0.5,
                position = position_dodge(width = 0.9)) +
  
   geom_hline(yintercept = 0, linetype = "dashed") +
  
  labs(y = 'Odds ratio') +
  
  scale_x_discrete(labels = c('Roads',
                              'Seismic Lines',
                              '3D Seismic Lines',
                              'Trails',
                              'Transmission Lines')) +
  
  add_phylopic(black_bear_img, 
               x = 5.3,
               y = 0.2,
               ysize = 0.05) +
  
  theme_classic() +
  
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 90,
                                   hjust = 1),
        axis.title = element_text(size = 16),
        axis.text = element_text(size = 14))

Let’s repeat this process for each species that we have enough data for.

Caribou

We may or may not have enough data for caribou but let’s try it at least for this preliminary report

We can use the same code from black bears (above) to run global models for each buffer width except remember we want to remove 250 meters

And in the same chunk to save time let’s also run the model.sel() function

Models/Model selection

caribou_mods_no250 <- final_osm_2023_df %>%
  
  # remove 350 meter buffer width
  purrr::discard_at('250 meter buffer') %>% 
  
  # use purrr map to make global models for all other buffer sizes
  purrr::map(
    ~.x %>%

     glmmTMB::glmmTMB(cbind(caribou, absent_caribou) ~
                        # HFI covariates in alphabetical order
                        scale(borrowpits) +
                        scale(pipeline) +
                        scale(clearings) +
                        scale(harvest) +
                        scale(facilities) +
                        scale(mines) +
                        scale(roads) +
                        scale(seismic_lines) +
                        scale(transmission_lines) +
                        scale(trails) +
                         scale(veg_edges) +
                        scale(wells) +
                        
                        # VEG covariates in numerical order
                        scale(lc_class110) +
                        scale(lc_class20) +
                        scale(lc_class210) +
                        scale(lc_class220) +
                        scale(lc_class230) +
                        scale(lc_class33) +
                        scale(lc_class34) +
                        scale(lc_class50) +
                        (1|array),
                   data = .,
                   family = 'binomial'))

# model selection
model.sel(caribou_mods_no250)

We get a warning that there are some model convergence problems, I expect this is because we don’t have enough data for caribou but I don’t have time to dig into this now so we will investigate more closely for final analysis

For caribou 3500m buffer is top model for now

Model summary 3500m

Let’s take a closer look at the top model summary

summary(caribou_mods_no250$`3500 meter buffer`)

There’s nothing that catches my eye immediately as being sus about this particular model so it may not have been one with convergence issues. We will keep it in report for now

Coyote

Models/Model selection

coyote_mods_no250 <- final_osm_2023_df %>%
  
  # remove 350 meter buffer width
  purrr::discard_at('250 meter buffer') %>% 
  
  # use purrr map to make global models for all other buffer sizes
  purrr::map(
    ~.x %>%

     glmmTMB::glmmTMB(cbind(coyote, absent_coyote) ~
                        seismic_lines +
                        pipeline +
                        borrowpits +
                        wellsites +
                        roads +
                        trails + 
                        lc_class20 +
                        lc_class34 +
                        lc_class50 +
                        lc_class110 +
                        lc_class210 +
                        lc_class220 +
                        lc_class230 +
                        (1|array),
                   data = .,
                   family = 'binomial'))

# model selection
model.sel(coyote_mods_no250)

for coyote top model appears to be 4500 m by quite a bit

Model summary 4500m

Let’s get the model summary for this model

summary(coyote_mods_no250$`4500 meter buffer`)

There is one lc class with a very high estimate and SE which seems a bit sus to me

Fisher

Models/Model selection

fisher_mods_no250 <- final_osm_2023_df %>%
  
  # remove 350 meter buffer width
  purrr::discard_at('250 meter buffer') %>% 
  
  # use purrr map to make global models for all other buffer sizes
  purrr::map(
    ~.x %>%

     glmmTMB::glmmTMB(cbind(fisher, absent_fisher) ~
                        seismic_lines +
                        pipeline +
                        borrowpits +
                        wellsites +
                        roads +
                        trails + 
                        lc_class20 +
                        lc_class34 +
                        lc_class50 +
                        lc_class110 +
                        lc_class210 +
                        lc_class220 +
                        lc_class230 +
                        (1|array),
                   data = .,
                   family = 'binomial'))

# model selection
model.sel(fisher_mods_no250)

For fisher top model is 2000 meter

Model summary 2000m

Let’s print the summary for this model

summary(fisher_mods_no250$`2000 meter buffer`)

Again lc_class34 has a very high standard error, we may not have enough data in this landcover class to use in the final analysis

Grey wolf

Models/Model selection

wolf_mods_no250 <- final_osm_2023_df %>%
  
  # remove 350 meter buffer width
  purrr::discard_at('250 meter buffer') %>% 
  
  # use purrr map to make global models for all other buffer sizes
  purrr::map(
    ~.x %>%

     glmmTMB::glmmTMB(cbind(grey_wolf, absent_grey_wolf) ~
                        seismic_lines +
                        pipeline +
                        borrowpits +
                        wellsites +
                        roads +
                        trails + 
                        lc_class20 +
                        lc_class34 +
                        lc_class50 +
                        lc_class110 +
                        lc_class210 +
                        lc_class220 +
                        lc_class230 +
                        (1|array),
                   data = .,
                   family = 'binomial'))

# model selection
model.sel(wolf_mods_no250)

For grey wolf top model is 4500 m buffer

Model summary 4500m

Let’s get the model summary for this model

summary(wolf_mods_no250$`4500 meter buffer`)

lc_class34 still presenting some issues, interesting that seismic lines weren’t significant and have a negative estimate

Lynx

Models/Model selection

lynx_mods_no250 <- final_osm_2023_df %>%
  
  # remove 350 meter buffer width
  purrr::discard_at('250 meter buffer') %>% 
  
  # use purrr map to make global models for all other buffer sizes
  purrr::map(
    ~.x %>%

     glmmTMB::glmmTMB(cbind(lynx, absent_lynx) ~
                        seismic_lines +
                        pipeline +
                        borrowpits +
                        wellsites +
                        roads +
                        trails + 
                        lc_class20 +
                        lc_class34 +
                        lc_class50 +
                        lc_class110 +
                        lc_class210 +
                        lc_class220 +
                        lc_class230 +
                        (1|array),
                   data = .,
                   family = 'binomial'))

# model selection
model.sel(lynx_mods_no250)

For lynx the top model is the 1000 m buffer

Model summary 1000m

Let’s get the model summary

summary(lynx_mods_no250$`1000 meter buffer`)

Moose

Models/Model selection

moose_mods_no250 <- final_osm_2023_df %>%
  
  # remove 350 meter buffer width
  purrr::discard_at('250 meter buffer') %>% 
  
  # use purrr map to make global models for all other buffer sizes
  purrr::map(
    ~.x %>%

     glmmTMB::glmmTMB(cbind(moose, absent_moose) ~
                        seismic_lines +
                        pipeline +
                        borrowpits +
                        wellsites +
                        roads +
                        trails + 
                        lc_class20 +
                        lc_class34 +
                        lc_class50 +
                        lc_class110 +
                        lc_class210 +
                        lc_class220 +
                        lc_class230 +
                        (1|array),
                   data = .,
                   family = 'binomial'))

# model selection
model.sel(moose_mods_no250)

For moose the top model is the 3750 m buffer

Model summary 3750m

Let’s get the model summary

summary(moose_mods_no250$`3750 meter buffer`)

Red fox

Models/Model selection

fox_mods_no250 <- final_osm_2023_df %>%
  
  # remove 350 meter buffer width
  purrr::discard_at('250 meter buffer') %>% 
  
  # use purrr map to make global models for all other buffer sizes
  purrr::map(
    ~.x %>%

     glmmTMB::glmmTMB(cbind(red_fox, absent_red_fox) ~
                        seismic_lines +
                        pipeline +
                        borrowpits +
                        wellsites +
                        roads +
                        trails + 
                        lc_class20 +
                        lc_class34 +
                        lc_class50 +
                        lc_class110 +
                        lc_class210 +
                        lc_class220 +
                        lc_class230 +
                        (1|array),
                   data = .,
                   family = 'binomial'))

# model selection
model.sel(fox_mods_no250)

For red fox the top model is 3750 m buffer

Model summary 3750m

Let’s get the model summary

summary(fox_mods_no250$`3750 meter buffer`)

Gagh! Borrow pits does not have a reasonable estimate and SE

White-tailed deer

Models/Model selection

deer_mods_no250 <- final_osm_2023_df %>%
  
  # remove 350 meter buffer width
  purrr::discard_at('250 meter buffer') %>% 
  
  # use purrr map to make global models for all other buffer sizes
  purrr::map(
    ~.x %>%

      # have to include the `` around the white-tailed_deer or R won't recognize it as a variable because of the -
     glmmTMB::glmmTMB(cbind(`white-tailed_deer`, `absent_white-tailed_deer`) ~
                        seismic_lines +
                        pipeline +
                        borrowpits +
                        wellsites +
                        roads +
                        trails + 
                        lc_class20 +
                        lc_class34 +
                        lc_class50 +
                        lc_class110 +
                        lc_class210 +
                        lc_class220 +
                        lc_class230 +
                        (1|array),
                   data = .,
                   family = 'binomial'))

# model selection
model.sel(deer_mods_no250)

For deer the top model was also the 3750 buffer

Model summary 3750m

Let’s get the model summary

summary(deer_mods_no250$`3750 meter buffer`)

Coefficient plots

What we need to do now is extract the coefficient estimates from each top model, as well as the 95% CI so we can plot them for easier visualization and understanding of the data

Extract coefficients

The confint() function will extract the coefficients and CI intervals from a model, so what we need to do is make a list of all the models, then use the map() function in purrr to apply the confint() function to all the models and get the coefficients. We want this to result in a tibble that has a column for the HFI feature (we aren’t plotting the lc_class data for this report), the upper and lower CI, and the coefficient estimate.

In order to do this we have to do a bit of data wrangling, currently this isn’t the most pleasing way to accomplish the desired outcome, but it works.

# This is also a dog shit way to do this but I need to get this done

# make a list of the top models for each species
top_models <- list(black_bear_mods_no250$`500 meter buffer`,
                   caribou_mods_no250$`1250 meter buffer`,
                   coyote_mods_no250$`4500 meter buffer`,
                   fisher_mods_no250$`2000 meter buffer`,
                   wolf_mods_no250$`4500 meter buffer`,
                   lynx_mods_no250$`1000 meter buffer`,
                   moose_mods_no250$`3750 meter buffer`,
                   fox_mods_no250$`3750 meter buffer`,
                   deer_mods_no250$`3750 meter buffer`) %>%  
  
  # pipe into purrr to create coefficient table for all models
  purrr::map(
    ~.x %>% 
      
      # extract the coefficients and upper and lower CI
      confint() %>% 
 
      # format resulting object as a tibble data frame
      as_tibble() %>%
      
      # subset to just the HFI variables for these plots
      slice_head(n = 6) %>% 
      
      # add a column where we can put the feature names
      rowid_to_column() %>% 
      
      # rename the columns for plotting
      rename('lower' = `2.5 %`,
             'upper' = `97.5 %`,
             'estimate' = Estimate,
             'feature' = rowid) %>% 
      
      # rename the entries to features, need to look at the order the features are in from the model summary and ensure it matches
      mutate(feature = as.factor(feature),
             feature = recode(feature,
                              '1' = 'seismic_lines',
                              '2' = 'pipeline',
                              '3' = 'borrowpits',
                              '4' = 'wellsites',
                              '5' = 'roads',
                              '6' = 'trails'))) %>% 
  
  # set the names of each resulting tibble data frame to the species name
  purrr::set_names('Black bear',
                   'Caribou',
                   'Coyote',
                   'Fisher',
                   'Grey wolf',
                   'Lynx',
                   'Moose',
                   'Red fox',
                   'White-tailed deer')

Merge data frames

Now we have a data frame with the bet coefficients for each species, but if we want these on a plot together we need them all in one data frame.

To merge data into one data frame we can use the list_rbind() function from the purrr package which will take each element of the list and stack them on top of one another just like rbind does with data frames, and if we use the names_to argument we can extract the names of the list elements and assign them to a column so we know which data comes from which species model (list element)

In this code I also add a new column called uuid which contains the image id (uuid) for a phylopic silhouette of each species that I may want to use for plotting

Phylopic.or is an open source online database of silhouettes various contributors have created for use. There is an R package that works with this data called rphylopic; we can use the get_uuid() function from this package to extract the data for a silhouette for each species we want, which is what I’ve done here.

# combine all list elements 
coeffs_df_all <- list_rbind(top_models,
                            names_to = 'species') %>% 
  
  # change species to a factor for plotting
  mutate(species = as.factor(species),
         
         # add phylopic uuid for each species for plotting 
         # the uuid is extracted using getuuid with the species name as name = ''
         uuid = case_when(species == 'Black bear' ~ get_uuid(name = 'Ursus americanus'),
                          species == 'Caribou' ~ get_uuid(name = 'Rangifer tarandus'),
                          species == 'Coyote' ~ get_uuid(name = 'Canis latrans'),
                          species == 'Fisher' ~ get_uuid(name = 'Pekania pennanti'),
                          species == 'Grey wolf' ~ get_uuid(name = 'Canis lupus'),
                          species == 'Lynx' ~ get_uuid(name = 'Lynx lynx'),
                          species == 'Moose' ~ get_uuid(name = 'Alces alces'),
                          species == 'Red fox' ~ get_uuid(name = 'Vulpes vulpes'),
                          species == 'White-taield deer' ~ get_uuid(name ='Odocoileus virginianus'))) 

Plot coefficients

Now let’s explore some different options to plot the coefficients

All species

Let’s try plotting all the species on one plot using ggplot()

# provide data and mapping aesthetics
ggplot(coeffs_df_all, aes(x = feature, 
                          y = estimate, 
                          group = uuid)) +
  
   geom_errorbar(aes(ymin = lower, 
                     ymax = upper, 
                     color = feature),
                width = 0.4,
                linewidth = 0.5,
                position = position_dodge(width = 0.9)) +
  
    # add points for each estimate for each covariate and use position = position_dodge to shift the points so all the species don't plot on top of one another
  geom_phylopic(aes(x = feature, 
                 y = estimate,
                 uuid = uuid),
             position = position_dodge(width = 0.9),
             size = 2)
# combine all list elements 
coeffs_df_all <- list_rbind(top_models,
                            names_to = 'species') %>% 
  
  # change species to a factor for plotting
  mutate(species = as.factor(species),
         
         # add phylopic uuid for each species for plotting 
         # the uuid is extracted using getuuid with the species name as name = ''
         uuid = case_when(species == 'Black bear' ~ get_uuid(name = 'Ursus americanus'),
                          species == 'Caribou' ~ get_uuid(name = 'Rangifer tarandus'),
                          species == 'Coyote' ~ get_uuid(name = 'Canis latrans'),
                          species == 'Fisher' ~ '735066c6-2f3e-4f97-acb1-06f55ae075c9',
                          species == 'Grey wolf' ~ get_uuid(name = 'Canis lupus'),
                          species == 'Lynx' ~ get_uuid(name = 'Lynx lynx'),
                          species == 'Moose' ~ '74eab34a-498c-4614-aece-f02361874f79',
                          species == 'Red fox' ~ '9c977769-bf1e-44d4-82ab-f9f93dce39ca',
                          species == 'White-taield deer' ~ '56f6fdb2-15d0-43b5-b13f-714f2cb0f5d0')) %>% 
  
  # need to remove problematic estimate which is going to skew plot since its so large compared to others
  filter(!c(species == 'Red fox' &
           feature == 'wellsites'))

After plotting the moose image I don’t like it, let’s manually replace it in the data

# I went on the phylopic website and saw there are three images for moose, I like the last one better so we will use it

get_uuid(name = 'Alces alces',
         n = 3)

get_uuid(name = 'Odocoileus virginianus',
         n = 3)

get_uuid(name = 'Pekania pennanti',
         n = 2)

get_uuid(name = 'Vulpes',
         n = 2)


# Then I manually copied this uuid and replaces it in the code above

Try plotting all

ggplot(coeffs_df_all, aes(x = feature, 
                          y = estimate, 
                          group = uuid)) +
  
   geom_errorbar(aes(ymin = lower, 
                     ymax = upper, 
                     color = feature),
                width = 0.4,
                linewidth = 0.5,
                position = position_dodge(width = 1.2)) +
  
    # add points for each estimate for each covariate and use position = position_dodge to shift the points so all the species don't plot on top of one another
  geom_phylopic(aes(x = feature, 
                 y = estimate,
                 uuid = uuid),
             position = position_dodge(width = 1.2),
             size = 2)
ggplot(coeffs, aes(x = feature, y = estimate)) +
  
  geom_point(size = 3, position = position_dodge(width = 0.5)) +
  
  geom_errorbar(aes(ymin = lower, ymax = upper),
                width = 0.4,
                linewidth = 1,
                position = position_dodge(width = 0.5)) +
  
  geom_hline(yintercept = 0, linetype = "dashed")+
  scale_color_manual(values = c("#56B4E9", "#009E73"), name = "Spatial Scale")+
  theme_classic()+
  ggtitle("Moose Response to Anthropogenic Disturbance Features")+
  ylab("Coefficient Estimate \n \u00B1 95% CI")+
  scale_x_discrete(labels =c("Borrowpits", "Harvest\nAreas", "Industrial\nSites", "Pipelines","Roads", "Seismic\nLines", "Trails", "Transmission\nLines"))+
    theme(axis.title.x = element_blank(),
        axis.text.x =  element_text(size = 12),
        axis.title.y = element_text(size = 14),
        legend.title = element_text(size = 12),
        plot.title = element_text(size = 15, hjust = 0.5))

If all else fails can use plot_model function